Multi-fidelity regression using a non-parametric relationship

Authors

  • Federico Zertuche
  • Celine Helbert
  • Anestis Antoniadis
Abstract

We study the synthesis of data from different experiments. These experiments are very complex computer simulations that take several hours to produce a response for a given input. Understanding the phenomenon modeled by the simulation requires a large number of responses, and in practice obtaining all of them is infeasible due to time constraints. This is why the computer simulation is often replaced by a simpler probabilistic model, also known as a metamodel, that is faster to run. The metamodel studied here is based on the hypothesis that the computer simulation is the realization of a Gaussian process indexed by the inputs and defined by a parametric mean function and a parametric covariance function. A small number of responses produced by the computer code are used to determine the values of the parameters of the mean and covariance functions. Given a new input, the predicted value is the expectation of the stochastic process at that input conditioned on the available responses. Since the stochastic process is Gaussian, there is a closed-form formula for this expectation and for the prediction error. When the precision of the output produced by the computer code can be tuned, it is possible to incorporate responses with different levels of fidelity to enhance the prediction of the most accurate simulation at a new input while respecting the time constraints. This is usually done by adding several imprecise responses instead of a few precise ones. The main examples of this type of computer experiment are numerical solutions of differential equations. The precision can depend on the size of the mesh of the domain of resolution used to produce the response, on the space onto which the solution is projected, or even on whether a part of the physical model involved is left aside. The problem is how to take into account all the information available. This problem has been studied by many authors, most notably by Le Gratiet in [1] and by Kennedy and O'Hagan in [3].
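As an illustration of the closed-form conditional expectation and prediction error mentioned above, here is a minimal sketch of a single-fidelity Gaussian process predictor. It is not the authors' implementation: the zero mean, the squared-exponential covariance, the fixed hyperparameter values, and the names `sq_exp_kernel` and `gp_predict` are all assumptions made for the example, whereas the abstract describes parametric mean and covariance functions whose parameters are estimated from the responses.

```python
import numpy as np

def sq_exp_kernel(a, b, variance=1.0, length_scale=0.2):
    """Squared-exponential covariance between 1-D input arrays a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return variance * np.exp(-0.5 * (d / length_scale) ** 2)

def gp_predict(x_train, y_train, x_new, variance=1.0, length_scale=0.2, jitter=1e-10):
    """Conditional mean and variance of a zero-mean Gaussian process at x_new."""
    K = sq_exp_kernel(x_train, x_train, variance, length_scale)
    K += jitter * np.eye(len(x_train))           # small diagonal term for numerical stability
    k_star = sq_exp_kernel(x_train, x_new, variance, length_scale)
    L = np.linalg.cholesky(K)                    # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = k_star.T @ alpha                      # E[Y(x_new) | responses]
    v = np.linalg.solve(L, k_star)
    var = variance - np.sum(v ** 2, axis=0)      # prediction error (conditional variance)
    return mean, np.maximum(var, 0.0)

# Hypothetical "simulator" responses at six inputs.
x_train = np.linspace(0.0, 1.0, 6)
y_train = np.sin(2 * np.pi * x_train)
x_new = np.array([0.0, 0.25, 0.5])
mean, var = gp_predict(x_train, y_train, x_new)
```

At an input that already has a response (here x = 0), the predictor reproduces that response and the conditional variance is essentially zero; away from the design points the variance grows.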
In the present work, we propose a new approach that differs from the existing ones. For ease of notation, only two precision (fidelity) levels are considered: 1 for the least accurate and 2 for the most precise. First, we assume that the most precise level is a function of the least accurate one. The difference between the two is modeled by the Gaussian process Z(2,x). If we suppose that Y(1,x) is the Gaussian process related to level 1, then Y(2,x), defined by equation (1), is also a Gaussian process. It models the outcomes of level 2.

Y(2,x) = φ(Y(1,x)) + Z(2,x)    (1)

Generalizing the results in [1], we propose a non-parametric approach in which we compute a locally linear approximation of the function φ. We estimate the relationship and build a predictor by using all the responses to compute the conditional expected values of Y(1,x) and Z(2,x). The prediction error is built from the prediction errors of Y(1,x) and Z(2,x). Then, we study an analogous model based on [2], in which the difference between the two levels is no longer a Gaussian process. This time the difference between the two computer simulations is modeled by the correlated errors ε_y. The correlation structure of the errors depends on the distance between the outputs of level 1. The new probabilistic model for the second simulator is given by equation (2), where Y(1,x) is still the Gaussian process related to level 1.

Y(2,x) = φ(Y(1,x)) + ε_y    (2)

Once again, we estimate φ by using locally linear polynomials. Since we consider a particular correlation structure for the errors, we use the algorithm described by Fernandez in [2] to correct the bias in the estimation of the smoothing parameter of the non-parametric regression.

Figure 1: Estimated relationship between two successive levels of a computer code that simulates the pressure transient in a porous medium.

MascotNum Annual Conference, April 23-25, 2014, ETH Zürich (Switzerland)
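To make the locally linear estimation of φ more concrete, the following is a minimal sketch of a local linear (kernel-weighted least squares) fit. It rests on several assumptions that are not in the abstract: paired outputs of the two levels are observed directly, the kernel is Gaussian, the bandwidth is fixed by hand rather than selected with the bias correction of [2], and the function name `local_linear_fit` is hypothetical.

```python
import numpy as np

def local_linear_fit(y1, y2, t, bandwidth):
    """Local linear estimate of phi(t) and its slope from paired outputs (y1, y2)."""
    u = y1 - t                                   # level-1 outputs centered at the target
    w = np.exp(-0.5 * (u / bandwidth) ** 2)      # Gaussian kernel weights (assumed kernel)
    X = np.column_stack([np.ones_like(u), u])    # local design matrix: intercept and slope
    WX = X * w[:, None]
    beta = np.linalg.solve(X.T @ WX, WX.T @ y2)  # weighted least squares solution
    return beta[0], beta[1]                      # phi_hat(t), slope of phi at t

# Hypothetical level-1 outputs and a noise-free non-linear relationship phi(y) = y^2.
y1 = np.linspace(-1.0, 1.0, 41)
y2 = y1 ** 2

phi_hat, slope_hat = local_linear_fit(y1, y2, t=0.5, bandwidth=0.1)

# A linear relationship is reproduced exactly, whatever the bandwidth.
lin_value, lin_slope = local_linear_fit(y1, 3.0 * y1 + 1.0, t=0.2, bandwidth=0.1)
```

With a non-linear φ the estimate carries a smoothing bias that shrinks with the bandwidth, which is one reason the choice of the smoothing parameter matters.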
Finally, the two models are tested to illustrate their advantages and shortcomings. First, by simulating the computer codes as Gaussian processes, we find that assuming that φ is linear when it is not can affect the results of the predictions. By using physical models, we notice that the relationship between two fidelity levels of a computer code can be non-linear, as shown in Figure 1, and in some cases not even function-like. We then briefly develop a case study related to a diphasic air-water flow in a rectangular domain.


Similar resources

Regression Modeling for Spherical Data via Non-parametric and Least Square Methods

Introduction Statistical analysis of the data on the Earth's surface was a favorite subject among many researchers. Such data can be related to animal's migration from a region to another position. Then, statistical modeling of their paths helps biological researchers to predict their movements and estimate the areas that are most likely to constitute the presence of the animals. From a geome...


Stochastic Non-Parametric Frontier Analysis

In this paper we develop an approach that synthesizes the best features of the two main methods in the estimation of production efficiency. Specically, our approach first allows for statistical noise, similar to Stochastic frontier analysis, and second, it allows modeling multiple-inputs-multiple-outputs technologies without imposing parametric assumptions on production relationship, similar to...


The Impact of Credit, Operational, and Liquidity Risks on the Efficiency of the Iranian Banking System

This article basically tries to examine the relationship between efficiency and risk in Iranian banking system. The scholars here simply employ two approaches, i.e. parametric (economic-based) approach, and non-parametric (mathematical optimization-based) approach, to assess the efficiency, rank the banks, select the optimal model and at last, identify the impact of credit, operational and liqu...


Semi-parametric Quantile Regression for Analysing Continuous Longitudinal Responses

Recently, quantile regression (QR) models are often applied for longitudinal data analysis. When the distribution of responses seems to be skew and asymmetric due to outliers and heavy-tails, QR models may work suitably. In this paper, a semi-parametric quantile regression model is developed for analysing continuous longitudinal responses. The error term's distribution is assumed to be Asymmetr...


Use of Two Smoothing Parameters in Penalized Spline Estimator for Bi-variate Predictor Non-parametric Regression Model

Penalized spline criteria involve the function of goodness of fit and penalty, which in the penalty function contains smoothing parameters. It serves to control the smoothness of the curve that works simultaneously with point knots and spline degree. The regression function with two predictors in the non-parametric model will have two different non-parametric regression functions. Therefore, we...



Publication date: 2017